While other review papers have used common metrics such as mean absolute error (MAE) and percent error, these cannot be compared directly across studies because of the large heterogeneity in the datasets, devices, and algorithms used in each study. For example, a study whose dataset has a narrow BP distribution may yield better results, as there is less variation that the model needs to explain. Here, we provide a brief summary of the statistics used to account for heterogeneity in dataset distributions and calibration techniques between studies, and provide analysis with interactive visuals. For specific details related to methodology, see Methods and our paper. Please see Wearable BP on GitHub for code to reproduce the Systematic Review results and visuals. To access the raw data used in our systematic review, please fill out this form.
Metadata Statistics
To understand the extracted parameters, we visualize the distribution of the different study parameters, stratified by calibration technique.
Summary Statistics
BP Distribution vs Error Distribution
To determine whether the BP distribution affects the error, we regress the standard deviation of the errors on the standard deviation of the BP distributions.
We find that low error is harder to achieve for wider BP distributions, which confirms that the BP distribution is a confounding factor in accuracy for subject split. Therefore, we must take the BP distribution into account in the analysis.
Time between calibration and test vs Error Distribution
To determine whether the accuracy of the personalization changes over time, we regress the error on the time between calibration and test (∆t). For this, ∆t was binned by unit into seconds (∆t=[0, 1)), minutes (∆t=[1, 2)), hours (∆t=[2, 3)), days (∆t=[3, 4)), and months (∆t=[4, 5)). Within each bin, ∆t is mapped to a fractional position according to how many of its units make up one unit of the next bin. For example, 2 weeks (14 days) has a value of 3+14/30≈3.47 because there are approximately 30 days in each month.
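The encoding of ∆t described above can be sketched as follows. This is a minimal illustration, not the code used in the review: the helper name `encode_delta_t` is hypothetical, and the divisor for the months bin (12 months per year) is an assumption, since only the 30-days-per-month case is stated in the text.

```python
# Hypothetical helper: maps a calibration-to-test gap onto the [0, 5) scale.
# Each entry is (bin offset, number of units that make up the next bin).
UNITS = {
    "seconds": (0, 60),  # 60 seconds per minute
    "minutes": (1, 60),  # 60 minutes per hour
    "hours": (2, 24),    # 24 hours per day
    "days": (3, 30),     # approximately 30 days per month (from the text)
    "months": (4, 12),   # ASSUMPTION: 12 months per year, not stated in the text
}


def encode_delta_t(value, unit):
    """Return the bin offset plus the fractional position inside the bin."""
    offset, per_next = UNITS[unit]
    if not 0 <= value < per_next:
        raise ValueError(f"{value} {unit} should be expressed in a larger unit")
    return offset + value / per_next


print(round(encode_delta_t(14, "days"), 2))  # 3.47, matching the 2-week example
```

Keeping the offsets and divisors in one table makes it easy to swap in a different convention for a bin without touching the encoding logic.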
Based on the regression lines, there is no significant relation between the accuracy of personalization studies and ∆t. Visually, however, there is a faint increasing trend.
Explained Deviation
In addition to the performance criteria stated in the standards, we condition our performance metric on the BP distribution, which is a significant source of heterogeneity between studies. To accomplish this, we adopt an interpretable statistic coined Explained Deviation (ED). For subject split, ED is used to compute the effectiveness of a device with estimation error parameterized by (\(\mu_\epsilon\), \(\sigma_\epsilon\)) on a population parameterized by (\(\mu_{pop}\), \(\sigma_{pop}\)), where \(\mu\) represents the mean and \(\sigma\) represents the standard deviation. The ED can be computed as \(ED = \frac{\sigma_{pop}}{\sigma_\epsilon}\). The higher the ED, the better the system performs. On the other hand, an ED of 1 indicates that the estimator performs no better than an estimator that predicts a constant value. Intuitively, this statistic measures how much of the BP distribution the model “explains”. Statistically, ED is similar to an F-test for the ratio of two variances with equal sample size, with the null hypothesis \(H_0: ED \leq ED_{min}\) and the alternative hypothesis \(H_1: ED > ED_{min}\), where \(ED_{min}\) is the computed minimum Explained Deviation that meets the standards (see Supplementary 1). One limitation of ED is that it assumes zero bias and normality, which do not always hold. However, it gives a good ballpark estimate in practice.
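As a minimal sketch of the subject-split definition above, ED can be computed from paired reference and estimated BP values. The function name `explained_deviation` and the toy readings are illustrative assumptions, not taken from the review's code or data.

```python
import statistics


def explained_deviation(reference_bp, estimated_bp):
    """ED = SD of the reference BP distribution / SD of the estimation error."""
    errors = [est - ref for est, ref in zip(estimated_bp, reference_bp)]
    sigma_pop = statistics.stdev(reference_bp)
    sigma_err = statistics.stdev(errors)
    return sigma_pop / sigma_err


# Toy reference readings in mmHg (illustrative values only).
reference = [100, 110, 120, 130, 140]

# An estimator that always predicts the same value has an error SD equal to
# the population SD, so its ED is 1.
constant_guess = [120] * len(reference)
print(explained_deviation(reference, constant_guess))
```

Because only standard deviations enter the ratio, a constant offset in the estimates does not change ED, which is the zero-bias limitation noted above.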
Furthermore, we can compute confidence intervals using the F-distribution by determining the bounds \(P(\sqrt{F_{\alpha/2}(n-1, n-1)} ED_{est} \leq ED_{true} \leq \sqrt{F_{1-\alpha/2}(n-1, n-1)} ED_{est}) = 1-\alpha\), where \(\alpha\) is the level of significance, \(P\) is the probability, \(F\) is the F-distribution, and \(n\) is the sample size (Supplementary 1). The Margin of Error is half the width of the Confidence Interval. Using the same formulation, we compute ED for personalization. However, in this case, we must take the change in BP of each subject into account: \(ED = \frac{\sigma_{\Delta BP}}{\sigma_{\Delta BP, err}}\), where \(\sigma_{\Delta BP}\) is the standard deviation of the BP changes of the study cohort and \(\sigma_{\Delta BP, err}\) is the standard deviation of the BP change error of the study cohort. Finally, to determine whether the sample size is sufficient to detect the reported result, we compute power (\(p\)). To do this, we define \(\alpha=0.05\), \(\beta=0.02\), and effect size \(ES=\frac{5}{8}\), based on ANSI/AAMI/ISO 81060-2:2019. This corresponds to \(p=0.98\) and a sample size of approximately 85 (Supplementary 2).
Estimating required Explained Deviation from AAMI/ANSI/ISO 81060-2:2019
The AAMI/ANSI/ISO Standard specifies that a validation study for a device intended for the general population should:
- Have n=85 subjects, including ≥30% males and ≥30% females
- Have ≥5% of the reference SBP readings ≤100 mmHg, ≥5% with ≥160 mmHg, and ≥20% with ≥140 mmHg
- Have ≥5% of the reference DBP readings ≤60 mmHg, ≥5% with ≥100 mmHg, and ≥20% with ≥85 mmHg
Currently, there are no publicly available datasets that satisfy these demographics. However, the PPG-BP dataset can meet these requirements if it is subsampled. To determine a subsample that satisfies the AAMI/ANSI/ISO Standards, we performed weighted sampling of subjects, where the weights were determined using Iterative Proportional Fitting (IPF) marginalized on the requirements. We repeated this process 10,000 times and determined the minimum variance of the subsampled datasets. Then, we took the maximum error standard deviation allowed by AAMI/ANSI/ISO (8 mmHg). The minimum ED for SBP and DBP was computed to be 2.17 and 1.39, respectively. The implementation can be found on GitHub.
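The two ingredients of this procedure that are simple to illustrate are the BP distribution check and the final ratio. The sketch below is not the implementation from the repository: the function names are hypothetical, the IPF-weighted sampling, subject count, and sex-ratio requirements are left out, and computing \(ED_{min}\) as the minimum subsample standard deviation divided by 8 mmHg is our reading of the description above.

```python
def fraction(values, predicate):
    """Share of readings that satisfy the predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)


def meets_bp_distribution(sbp, dbp):
    """Check the reference-BP distribution criteria listed above (mmHg)."""
    return (
        fraction(sbp, lambda v: v <= 100) >= 0.05
        and fraction(sbp, lambda v: v >= 160) >= 0.05
        and fraction(sbp, lambda v: v >= 140) >= 0.20
        and fraction(dbp, lambda v: v <= 60) >= 0.05
        and fraction(dbp, lambda v: v >= 100) >= 0.05
        and fraction(dbp, lambda v: v >= 85) >= 0.20
    )


def minimum_ed(min_population_sd, max_error_sd=8.0):
    """ASSUMED form: smallest admissible population SD / largest allowed error SD."""
    return min_population_sd / max_error_sd
```

In a full pipeline, `meets_bp_distribution` would act as the acceptance test for each candidate subsample, and `minimum_ed` would be applied to the smallest standard deviation observed among the accepted subsamples.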
DBP Explained Deviation vs SBP Explained Deviation
Using our proposed metric to evaluate estimation accuracy, we plot the ED of SBP versus ED of DBP for subject split and personalization studies. We indicate the size of points using the power of the study and also report important study information using a hover tool. Moreover, we delineate in green the minimum ED (\(ED_{min}\)), computed from the AAMI/ANSI/ISO Standards for SBP and DBP, which were 2.17 and 1.39 respectively.
BP Distributions vs Explained Deviations
To determine whether there is significant publication bias, we regress Explained Deviations on the standard deviation of the BP distributions.
We can infer from this that there is significant publication bias. Studies with narrow BP distributions tend not to explain the data well, yet are published. In contrast, studies with wide BP distributions tend to explain the data better. This may be attributed to the error requirements specified by the different BP device standards: a study with a wide BP distribution needs a better system to attain errors below the specified limit.
Supplementary 1: Explained Deviation Confidence Interval
Assume:
- \(n_1\) independent observations from a normally distributed population with variance \(\sigma_1^2\) and
- \(n_2\) independent observations from a normally distributed population with variance \(\sigma_2^2\).
Given sample variances \(s_1^2\) and \(s_2^2\), we can write \(\frac{(n_1 - 1)s_1^2}{\sigma_1^2} \sim \chi_{n_1-1}^2\) and \(\frac{(n_2 - 1)s_2^2}{\sigma_2^2} \sim \chi_{n_2-1}^2\).
The ratio of these two variables, each divided by its degrees of freedom, follows an F-distribution and can be written in the form \(F=\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\).
The \(1-\alpha\) confidence interval can be written as \(P(F_{\alpha/2}(n_1-1, n_2-1) \lt \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \lt F_{1-\alpha/2}(n_1-1, n_2-1)) = 1-\alpha\).
Simplifying the expression, setting \(n_1 = n_2 = n\), using the identity \(F_{1-\alpha/2}(n_1-1, n_2-1) = \frac{1}{F_{\alpha/2}(n_2-1, n_1-1)}\), and substituting \(\frac{s_1}{s_2}=ED_{est}\) and \(\frac{\sigma_1}{\sigma_2}=ED_{true}\) gives \(P(\sqrt{F_{\alpha/2}(n-1, n-1)} ED_{est} < ED_{true} < \sqrt{F_{1-\alpha/2}(n-1, n-1)} ED_{est}) = 1-\alpha\).
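For readers who want the intermediate steps, the simplification can be worked through as follows. This assumes \(F_q\) denotes the lower \(q\)-quantile of the F-distribution and abbreviates \(F_q(n-1, n-1)\) as \(F_q\).

```latex
\begin{align*}
F &= \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}
   = \frac{(s_1/s_2)^2}{(\sigma_1/\sigma_2)^2}
   = \frac{ED_{est}^2}{ED_{true}^2} \\[4pt]
F_{\alpha/2} < \frac{ED_{est}^2}{ED_{true}^2} < F_{1-\alpha/2}
  &\iff \frac{ED_{est}^2}{F_{1-\alpha/2}} < ED_{true}^2 < \frac{ED_{est}^2}{F_{\alpha/2}} \\[4pt]
  &\iff \frac{ED_{est}}{\sqrt{F_{1-\alpha/2}}} < ED_{true} < \frac{ED_{est}}{\sqrt{F_{\alpha/2}}} \\[4pt]
  &\iff \sqrt{F_{\alpha/2}}\, ED_{est} < ED_{true} < \sqrt{F_{1-\alpha/2}}\, ED_{est}
\end{align*}
```

The last line uses the identity with equal degrees of freedom, \(1/F_{1-\alpha/2}(n-1, n-1) = F_{\alpha/2}(n-1, n-1)\), which swaps the two bounds into the stated form.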
Supplementary 2: Power Analysis
Compute \(p=1-\beta=\Phi(ES \sqrt{\frac{n}{2}}-z_{1-\alpha/2})\), where \(ES\) (effect size) is given by error bias/error standard deviation=\(\frac{5}{8}\), \(\alpha\) is the selected level of significance, \(\beta=1-p\), \(z_{1-\alpha/2}\) is the corresponding quantile of the standard normal distribution, and \(\Phi\) is the cumulative distribution function of the standard normal distribution.
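The power formula above can be evaluated with the Python standard library alone. This is a small sketch; the function name `power` and its default arguments are illustrative choices that mirror the values stated in the text.

```python
from statistics import NormalDist


def power(n, effect_size=5 / 8, alpha=0.05):
    """p = Phi(ES * sqrt(n / 2) - z_{1 - alpha/2}) for a standard normal Phi."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    return standard_normal.cdf(effect_size * (n / 2) ** 0.5 - z_crit)


print(round(power(85), 2))  # 0.98, consistent with the stated power for n = 85
```

Evaluating `power` over a range of `n` shows how quickly the power drops for the smaller cohorts common in the reviewed studies.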